Statistical Method of Context Evaluation for Biological Sequence Similarity

Authors

Alina Bogan-Marta

Ioannis Pitas

Kleoniki Lyroudia

Abstract

This paper proposes and tests a new strategy for detecting and measuring similarity between protein sequences. The approach has its roots in computational linguistics and in related techniques for quantifying and comparing content in character strings. Pairwise comparison of proteins relies on the content regularities expected to uniquely characterize each sequence. These regularities are captured by n-gram-based modelling techniques and exploited by cross-entropy-related measures. In this attempt to bring theoretical ideas from computational linguistics into bioinformatics, we experimented with two implementations, always with the ultimate goal of developing practical, computationally efficient algorithms for expressing protein similarity. The experimental analysis reported here provides evidence for the usefulness of the proposed approach and motivates further development of linguistics-related tools as a means of analysing biological sequences.
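The abstract does not give the exact model or measure, so the following is only a minimal sketch of the general idea: build an n-gram model from one sequence and score a second sequence by its cross-entropy under that model. The function names, the bigram default, the add-one smoothing, and the 20-letter alphabet are all illustrative assumptions, not the authors' implementation.

```python
import math
from collections import Counter

ALPHABET_SIZE = 20  # assumption: the 20 standard amino acids


def ngram_counts(seq, n):
    """Count every overlapping n-gram in seq."""
    return Counter(seq[i:i + n] for i in range(len(seq) - n + 1))


def cross_entropy(reference, query, n=2):
    """Bits per n-gram needed to encode query with an n-gram model of reference."""
    if len(query) < n or len(reference) < n:
        raise ValueError("both sequences must contain at least n residues")
    grams = ngram_counts(reference, n)
    # Context counts: how often each (n-1)-prefix is followed by some residue.
    contexts = Counter()
    for gram, count in grams.items():
        contexts[gram[:-1]] += count
    query_grams = ngram_counts(query, n)
    total = sum(query_grams.values())
    bits = 0.0
    for gram, count in query_grams.items():
        # Add-one smoothing (an assumption) keeps unseen n-grams above zero.
        p = (grams[gram] + 1) / (contexts[gram[:-1]] + ALPHABET_SIZE)
        bits -= count * math.log2(p)
    return bits / total
```

Under these assumptions, a lower value suggests that the query shares more n-gram regularities with the reference. The measure is not symmetric, so a pairwise score would need some convention, such as evaluating both directions.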


Related resources

A computational method to analyze the similarity of biological sequences under uncertainty

In this paper, we propose a new method to analyze the difference and similarity of biological sequences, based on fuzzy set theory. Considering the sequence order and some chemical and structural properties, we present a computational method to cluster the biological sequences. Through several examples, we show that the new method is relatively easy and we are able to compare the sequences of arbi...


An Empirical Comparison of Distance Measures for Multivariate Time Series Clustering

Multivariate time series (MTS) data are ubiquitous in science and daily life, and measuring their similarity is a core part of the MTS analysis process. Many research efforts in this context have focused on proposing novel similarity measures for the underlying data. However, with the countless techniques to estimate similarity between MTS, this field suffers from a lack of comparative...


A Study of Sequence Clustering on Protein’s Primary Structure using a Statistical Method

The clustering of biological sequences into biologically meaningful classes poses two computationally complex challenges: the choice of a biologically pertinent and computable criterion to evaluate cluster homogeneity, and the optimal exploration of the solution space. Here we analyse the clustering potential of a new method of sequence similarity based on statistical sequence conte...


Clustering Sequences with a Statistical Content Evaluation Method


On the statistical significance of nucleic acid similarities

When evaluating sequence similarities among nucleic acids by the usual methods, statistical significance is often found when the biological significance of the similarity is dubious. We demonstrate that the known statistical properties of nucleic acid sequences strongly affect the statistical distribution of similarity values when calculated by standard procedures. We propose a series of models...



Publication date: 2006
